TECNO-STREAMS: Tracking Evolving Clusters in Noisy Data Streams with a Scalable Immune System Learning Model

Authors

  • Olfa Nasraoui
  • Cesar Cardona Uribe
  • Carlos Rojas Coronel
  • Fabio A. González
Abstract

Artificial Immune System (AIS) models hold considerable promise for unsupervised learning. However, existing models are not scalable, which limits their use in data mining. We propose a new AIS-based clustering approach (TECNO-STREAMS) that addresses the weaknesses of current AIS models. Compared to existing AIS-based techniques, our approach exhibits superior learning abilities while requiring low memory and computational costs. As with the natural immune system, the strongest advantage of immune-based learning over other approaches is expected to be its ease of adaptation to the dynamic environments that characterize several applications, particularly the mining of data streams. We illustrate the ability of the proposed approach to detect clusters in noisy data sets and to mine evolving user profiles from Web clickstream data in a single pass. TECNO-STREAMS adheres to all the requirements of clustering data streams: compactness of representation, fast incremental processing of new data points, and clear and fast identification of outliers.

Similar resources

Mining Evolving Web Clickstreams with Explicit Retrieval Similarity Measures

Data on the Web is noisy, huge, and dynamic. This poses enormous challenges to most data mining techniques that try to extract patterns from this data. While scalable data mining methods are expected to cope with the size challenge, coping with evolving trends in noisy data in a continuous fashion, and without any unnecessary stoppages and reconfigurations is still an open challenge. This dynam...

Robust Clustering for Tracking Noisy Evolving Data Streams

We present a new approach for tracking evolving and noisy data streams by estimating clusters based on density, while taking into account the possible presence of an unknown number of outliers, the emergence of new patterns, and the forgetting of old patterns.

Keywords: evolving data streams, robust clustering, dynamic clustering, stream clustering, scalable clustering

Mining Evolving User Profiles in Noisy Web Clickstream Data with a Scalable Immune System Clustering Algorithm

Web usage mining has recently attracted attention as a viable framework for extracting useful access pattern information, such as user profiles, from massive amounts of Web log data for the purpose of Web site personalization and organization. These efforts have relied mainly on clustering or association rule discovery as the enabling data mining technologies. Typically, data mining has to be c...

A Model For The Residence Time Distribution and Holdup Measurement in a Two Impinging Streams Cyclone Reactor/Contactor in Solid-Liquid Systems

In this paper, a two impinging streams cyclone contacting system suitable for handling solid-liquid systems has been studied. Pertinent parameters such as solid holdup, mean residence time, and Residence Time Distribution (RTD) of solid particles have been investigated. A stochastic model based on Markov chain processes has been applied which describes the behavior of solid partic...

A New Mathematical Model for the Prediction of Internal Recirculation in Impinging Streams Reactors

A mathematical model for the prediction of internal recirculation of complex impinging stream reactors has been presented. The model constitutes a repetition of a series of ideal plug flow reactors and CSTR reactors with recirculation. The simplicity of the repeating motif allows for the derivation of an algebraic relation of the whole system using the Laplace transform. An impinging stream...

Publication date: 2003